DOCUMENT RESUME

ED 189 166                                                   TM 800 311

AUTHOR          Hambleton, Ronald K.; Simon, Robert A.
TITLE           Steps for Constructing Criterion-Referenced Tests.
                Laboratory of Psychometric and Evaluative Research
                Report No. 104.
INSTITUTION     Massachusetts Univ., Amherst. School of Education.
PUB DATE        Apr 80
NOTE            64p.; Paper presented at the Annual Meeting of the
                American Educational Research Association (64th,
                Boston, MA, April 7-11, 1980).

EDRS PRICE      MF01/PC03 Plus Postage.
DESCRIPTORS     *Criterion Referenced Tests; Cutting Scores;
                *Guidelines; Scoring; *Test Construction; *Test
                Format; Testing Problems
IDENTIFIERS     Domain Specification; *Test Content; Test Manuals

ABSTRACT
     The subject of constructing criterion-referenced tests is often
researched, but many technical problems remain to be satisfactorily
resolved. Foremost, criterion-referenced test developers need a
comprehensive set of steps for construction. In this paper, 14 logical
steps for building criterion-referenced tests that refer to several
different applications and allow for objective and non-objective
formats are offered: 1) preliminary considerations; 2) identification
of possible content; 3) preparation of domain specifications;
4) review of domain specifications; 5) additional test planning;
6) preparation of test content; 7) preparation of scoring method;
8) test materials review; 9) compilation of final form of test;
10) determination of standards; 11) preparation of report forms;
12) preparation of technical manual; 13) publication of test; and
14) collection of technical data. Four significant contributions of
the steps are: 1) use of a priori methods for validation;
2) allowance for use of objective/non-objective test formats;
3) flexibility of steps for use in distinct situations (classroom;
district/state; state and national); and 4) comprehensiveness of
steps. In addition to the steps, a discussion of rationale for
inclusion of each step and guidelines for implementation are
provided. (GSK)



***********************************************************************
*  Reproductions supplied by EDRS are the best that can be made       *
*  from the original document.                                        *
***********************************************************************



Steps for Constructing Criterion-Referenced Tests¹,²

Ronald K. Hambleton and Robert A. Simon

University of Massachusetts, Amherst

Glaser (1963) and Popham and Husek (1969) were the first researchers
to make the case for criterion-referenced tests. Popham and Husek also
offered a set of methods and procedures for constructing criterion-
referenced tests and interpreting test scores. Since the pioneering
work of Popham and Husek in 1969, there have been hundreds of research
papers written about technical matters associated with building
criterion-referenced tests. For example, the psychometric literature
abounds with papers which consider such topics as (1) writing
objectives, (2) preparing and validating test items, (3) determining
test lengths, (4) selecting test items, (5) assessing the reliability
and validity of test scores and decisions, and (6) evaluating tests.
Berk (1980), Hambleton, Swaminathan, Algina, and Coulson (1978), and
Popham (1978) offer reviews of many of these contributions.

Of course many technical problems remain to be satisfactorily
resolved. For one, criterion-referenced test developers need a
comprehensive set of steps for building criterion-referenced tests.
The availability of a set of steps would increase the likelihood that
test developers would consider all of the proper steps and carry them
out in the correct sequence. Unfortunately, current models for
criterion-referenced test development have several shortcomings. One
shortcoming is that they emphasize the building of tests which use
multiple-choice, true-false, or matching questions (Hambleton &
Eignor, 1979a, 1979b; Millman, 1974; Popham, 1978). A common criticism
of criterion-referenced tests (or basic skills tests, competency
tests, or minimum competency tests, as they are sometimes called) is
that there is almost a total reliance on objective formats and
therefore the tests are limited in the skills they can measure. Many
important skills such as writing and speaking can be measured better
(and sometimes only) through the use of essays, observational methods,
and simulations, to name just three non-objective item formats.

¹ Laboratory of Psychometric and Evaluative Research Report No. 104.
  Amherst, MA: School of Education, University of Massachusetts, 1980.

² A paper presented at the annual meeting of the American Educational
  Research Association, Boston, 1980.

Reliance on objective test items is due to the (relative) ease with
which they can be written and administered, to the convenient way in
which they can be scored, and to the lack of experience among test
developers in using formats for test data collection such as
observations, simulations, and work-samples. But criterion-referenced
tests need not consist solely of objective test items. For example,
the National Assessment of Educational Progress uses a variety of item
types in order to provide useful information about the quality of
American schools. If criterion-referenced testing programs are to
achieve their full potential, more use must be made of non-objective
formats so that skills such as writing checks, utilizing the resources
of a library, and preparing a resume can be assessed.



Another shortcoming of available models for test development is that
they are often specific to particular applications. It would be highly
desirable to have a list of steps which is broad enough to guide the
preparation (1) of tests at the classroom level (for diagnosis and
monitoring student progress), (2) of tests at the district and state
level (for program evaluation and remediation) and (3) of tests at the
state and national level for use in certification and licensure.

It seems clear then that there is a definite need for a comprehensive
set of steps for building criterion-referenced tests. Also, it seems
unnecessarily restrictive to offer a set of steps which are limited to
a particular format or to a particular application. In this paper a
set of logical steps for building criterion-referenced tests that
apply to several common (but different) applications and allow for
both objective and non-objective formats will be offered. The steps
represent a combination and extension of prior work by Tinkleman
(1971), Osborne (1973), McKeegan (undated), Sanders and Sachse (1975)
and Hambleton and Eignor (1979a). Four significant contributions of
the steps are:

     1. The use of a priori methods to validate the test blueprint.

     2. The allowance for the use of both objective and non-objective
        test formats by placing the format decision in its proper
        position in the sequence.

     3. The flexibility of the steps for use in three relatively
        distinct situations, i.e., classroom tests, large scale
        assessment and occupational/professional licensure and
        certification examinations.

     4. The comprehensiveness of the steps in that they cover the
        entire process of test development and validation pertaining
        to the assessment of both knowledge and skills.



Constructing Criterion-Referenced Tests

In this section of the paper a set of 14 steps will be introduced
along with a brief discussion of each step. The 14 step model is
presented in Figure 1. In most instances, the outline is sufficiently
descriptive so elaboration in the text is minimized. The text consists
primarily of points which need elaboration and additional comments
concerning some aspects of the steps.



1. Preliminary Considerations in Preparing a Test

The first step is essential to keep the process focused in a useful
direction. A committee which represents those groups which have some
responsibility for the test should be formed to oversee the test
development process. The committee should address itself to matters
such as:

     1. the purpose(s) of the test

     2. the group(s) to be assessed

     3. identification of recipients of test score information and
        how they will use the information

     4. the content areas (specified in general terms) which will
        be covered by the test

     5. the test length specified in terms of the approximate time
        available to administer the test

     6. the amount of time, money, expertise, and personnel available
        to carry out the test development process

     7. a timeline for test development, with people and resources
        assigned to assure completion of each step.



Figure 1. Steps for constructing criterion-referenced tests

1. Preliminary considerations in preparing a test

   a. State the purpose(s) of the test
        i. Classroom (for example, diagnosis, description, or
           instructional decision-making)
       ii. Large Scale Assessment (for example, program evaluation,
           or student remediation)
      iii. Certification and Licensure (for example, awarding of high
           school diplomas, or controlling entry into occupations
           and professions)

   b. Identify the group(s) to be assessed and the groups who will
      receive test score information

   c. Specify the content area to be covered and the approximate test
      administration time (or test length)

   d. Specify the amount of time, money, and expertise available to
      complete the test development project

   e. Prepare a list of activities, attach deadlines and assign people
      and resources

2. Identification of possible content for inclusion in a test

   a. Form a committee of individuals to carry out the required work

   b. Prepare a first draft of the content (a listing of specific
      behaviors or topics is desired)
        i. Classroom
           • build from the present curriculum and what is currently
             taught
       ii. Large Scale Assessment
           • review curricula and textbooks
           • involve individuals with an interest in the scope and
             direction of the test (for example, parents, community
             leaders, legislators, school board members, curriculum
             specialists, principals, teachers, and students)
      iii. Certification and Licensure
           • prepare an initial list of jobs and associated
             responsibilities and functions (and possibly specify
             activities, knowledge, and skills at this time as well)
           • complete the list of jobs, responsibilities, etc., with
             the aid of textbooks, interviews with trainers and
             practitioners
           • if high school graduation exams, generate content
             consistent with the purpose(s) of the test

   c. Specify the content in "descriptive" objectives (i.e., with
      sufficient specificity for people to understand the content)

   d. Select the most appropriate objectives for additional
      consideration
        i. Classroom
           • relevant groups (probably teachers but possibly parents
             and administrators too) can meet to discuss the merits of
             different objectives in relation to the purpose(s) of the
             test
           • consensus decision-making, the Delphi technique, and
             questionnaires are three possible ways of collecting data
       ii. Large Scale Assessment
           • decision-makers meet to select the content
           • surveys of interested individuals (for example, parents,
             teachers, principals, and students) can be carried out
             and the results are used by the committee in making
             decisions about content
           • a combination of the two methods can be initiated
      iii. Certification and Licensure
           • survey job holders and ask them to rate job components in
             terms of their importance and frequency of occurrence
           • if high school graduation exams, decision-makers can make
             a selection of content with the aid of survey data
             (respondents can be asked to "rank" competencies, and
             indicate their importance)

   e. Validate the selection of content
        i. Classroom
           • seek opinions of the test content from teachers, parents,
             principals, etc. (if suggested revisions are substantial,
             revise the content and repeat this step)
       ii. Large Scale Assessment
           • seek opinions of the test content from teachers, parents,
             principals, and community leaders
      iii. Certification and Licensure
           • determine the match (or degree of overlap) between the
             job specification and the content
           • if high school graduation exams, seek opinions of the
             test content from relevant decision-makers, associations,
             etc.
       iv. Make necessary revisions and/or additions to the content

3. Preparation of "domain specifications"

   a. Organize the validated objectives in a useful way (for example,
      they can be organized around broad content categories), and
      prepare domain specifications (or some other type of device for
      clarifying the scope of content and format to assess performance
      on the objectives)

   b. Determine which objectives can be combined by giving special
      attention to:
        i. test format (objective vs. non-objective)
       ii. test environment (actual or simulation)
      iii. personnel requirements
       iv. methods of scoring
        v. materials needed and performance aids



4. Review of domain specifications

   a. Identify reviewers and train them in their task

   b. Assess the clarity, completeness (on the validated objectives
      from step 2 being measured), choice of item format, etc., of the
      domain specifications

   c. Revise the domain specifications based on data from 4(b)

5. Additional test planning

   a. Assess the feasibility of including all of the domain
      specifications in the test (consider the costs and time)

   b. If some must be eliminated, consider the ranking data collected
      at step 2. Also, consider combining several of the less
      important validated objectives into one.

   c. With multiple domain specifications, there may be advantages, if
      simulations are to be involved, to connect them to one another
      via a common theme or situation

   d. State the number of test items to measure each domain
      specification

   e. Determine the number of test item writers needed and plan for
      having them complete their work

6. Preparation of the "test content" (Do "a" or "b")

   a. Non-objective format
        i. collect performance aids/obtain resources required by the
           domain specification
       ii. give instructions to item writers along with a copy of the
           domain specification
      iii. prepare test content, student and administrator directions,
           aids, props, handouts, and set time limits (if necessary)

   b. Objective format
        i. give instructions to item writers and indicate the number
           of items to be written
       ii. prepare a draft set of test items and edit them
      iii. prepare a draft set of directions for administrators and
           examinees

7. Preparation of a scoring method (Do "a" or "b" again)

   a. Non-objective format
        i. choose a scoring method from possibilities specified in
           each domain specification
       ii. prepare scoring forms (usually both objective and non-
           objective forms) for process, products, or both
      iii. prepare detailed methods for using the scoring forms and
           training scorers

   b. Objective format
        i. develop scoring keys to reflect item formats
       ii. prepare methods for scoring items

8. Test materials review

   a. Content specialists review test directions, content, and
      scoring; study items for racial, ethnic, and sex bias; and
      provide suggestions for revision

   b. Measurement specialists review the technical soundness of test
      methods (item quality, validity of scoring, layout, time limits,
      etc.) and provide suggestions for revision

   c. Make necessary revisions based on 8(a) and (b)

   d. Try out the test materials on a sample of examinees similar in
      characteristics to the groups for whom the test is intended

   e. Make revisions based on 8(d) and assess test score reliability

   f. If revisions are extensive, repeat step 8(d)

9. Compilation of the final form (or forms) of the test

   a. Finalize the test directions

   b. Compile the final draft of test content (prepare parallel-forms
      if necessary)

   c. Finalize and state the scoring method

   d. Provide for test security (this step is not always necessary)

   e. Have representatives of minority groups study the items for bias

   f. Design and carry out an equating study (from one form to another)

   g. Prepare a practice test for administration prior to the test

10. Determination of standards

   a. Form a standard-setting committee

   b. Select a standard-setting method, train the committee in its use
      and implement it

   c. Assess the reliability of the derived standards across members
      of the committee or across "parallel" committees

   d. Design and conduct a study to address the validity of decisions
      resulting from the use of the standards

11. Preparation of report forms

   a. Prepare an informative reporting form to contain all relevant
      information and which is written in a style which will be
      meaningful to those for whom the report is intended

   b. Form a committee to review the material from 11(a) and to make
      necessary revisions and extensions

   c. Finalize the report forms

12. Preparation of a technical manual

   a. Administer the test to appropriate samples of examinees

   b. Assess the reliability of descriptions and decisions of all
      reported scores. With the judgmental scoring formats it is also
      necessary to check the inter-rater and inter-observer
      reliability of both the objective-type and subjective-type
      scoring criteria

   c. Assess the construct validity of descriptions and decisions of
      all reported scores

   d. Compile norms tables (if desired)

   e. Reassess the cut-off scores, related results (percent masters
      and non-masters), and their implications and make modifications

13. Publication of the test

   a. Finalize item layout and format

   b. Print the test, technical manual, along with report forms and an
      interpretation guide

   c. Allow for different cut-off scores in the reporting of results

14. Collection of technical data (over time)

   a. Plan to collect item statistics and test score reliability,
      validity, and norms information periodically



The results of this step should be written up and used as a guide
by those who will actually construct the test.

2. Identification of Possible Content for Inclusion in a Test

The outcome of this step is a curriculum or job relevant test
blueprint. The precision of the blueprint should be tempered by the
importance attached to the test scores. If a test is to be used to
make important decisions such as certifying pilots or doctors, or
granting high school diplomas, meticulous care should be taken in
determining test content. Carefully chosen individuals or groups who
have an interest in the tests, who may be influenced by them, or who
have content expertise should be represented in the process. If a test
is to be used to monitor classroom progress, then somewhat less effort
should be expended here unless the curriculum will be put in place
across a large number of schools.

First, a committee should be formed to carry out the required work.
For classroom tests this committee might include the teacher, but
also perhaps other teachers and/or parents as well. For large scale
assessment, individuals with an interest in the test should be
involved. This might include teachers, parents, administrators,
community leaders, etc. For certification or licensure tests the
committee would include representatives from professional
organizations and the government.

The next task is to prepare an extensive list of possible content.
This list can be quite long, even hundreds of objectives.
Brainstorming is a good technique for generating a list because no
evaluation of the desirability of including any particular knowledge
or skill is to take place at this stage. After brainstorming (or
during it) the list should be extended. For classroom tests the list
can be built from the present curriculum. Lists for large scale
assessment projects should be drawn from available curricula and
textbooks, but ideas should also be solicited from all those who may
have an interest in the test: parents, citizens, educators, school
board(s), the business community, union members, scholars and even
students should be surveyed for additional test content ideas. For
occupational/licensure tests an exhaustive job list should be drawn
from textbooks, curricula, trainers (teachers), practitioners,
observational studies, and job analysis studies.

The elements which have been identified for possible inclusion in the
test should be put into "descriptive objective" form. A descriptive
objective is used so that other people have a clearer picture of what
is on the list, i.e., what the objectives mean. A descriptive
objective has two components: (1) the behavior of interest and (2) a
partial list of the component skills of the behavior of interest. Two
examples of descriptive objectives are given below:

     1. Descriptive Objective: Utilize the resources of a library

        Component Skills

          • Use a card catalogue
          • Use a reader's guide
          • Use the reference section

     2. Descriptive Objective: Maintain family finances

        Component Skills

          • Balance a checking account
          • Create a realistic budget

Even better, although it may be too time consuming if the testing
project is a small one, is the preparation of an "occupational
analysis" (i.e., the specification of responsibilities, tasks, and
corresponding knowledge and skills which define an occupation). An
example for the occupation "test developer" is presented in Figure 2.
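The two-part structure of a descriptive objective can be made concrete with a small record type. The following is only an illustrative sketch: the class name `DescriptiveObjective` and its `render` method are hypothetical and are not part of the paper; only the two example objectives come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptiveObjective:
    """A behavior of interest plus a partial list of its component skills."""

    behavior: str
    component_skills: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the objective in a plain outline form."""
        lines = [f"Descriptive Objective: {self.behavior}", "Component Skills"]
        lines += [f"  - {skill}" for skill in self.component_skills]
        return "\n".join(lines)


# The two examples given in the text.
library = DescriptiveObjective(
    "Utilize the resources of a library",
    ["Use a card catalogue", "Use a reader's guide", "Use the reference section"],
)
finances = DescriptiveObjective(
    "Maintain family finances",
    ["Balance a checking account", "Create a realistic budget"],
)

print(library.render())
```

Keeping the component skills as an open-ended list reflects the text's point that the list is partial: more skills can be appended as the committee extends the content listing.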

Figure 2. Example of an Occupational Analysis

(Career Area)
Education

   (Job)
   Test Developer

      (Responsibilities)
      Examples
        1. Constructing knowledge tests
        2. Constructing skills tests
        3. Conducting technical analysis of test scores
        4. Selecting instruments
        5. Conducting test development workshops

      (Tasks)
      Examples
        1. Preparing test specifications
        2. Writing test items
        3. Editing test items
        4. Piloting test items
        5. Assembling tests

      (Knowledge and Skills)
      Examples
        2.-K.1  Defines item formats (M-C, T-F, etc.)
        2.-K.2  Lists the characteristics of a well-written
                multiple-choice test item
        2.-S.1  Able to write multiple-choice test items matched
                to objectives

   (Job)
   Building Principal

      (Responsibilities)
      Examples
        1. Preparing building budgets
        2. Scheduling resource uses
        3. Maintaining student discipline

   (Job)
   School Teacher

      (Responsibilities)
      Examples
        1. Maintaining class records
        2. Providing instruction
        3. Communicating with parents
        4. Supervising extra-curricular activities

After the possible content has been extensively listed, the next step
is to select the content which is appropriate for inclusion in the
test blueprint. If the test is for use in a single classroom, the
teacher may be the sole decision maker but other teachers may help
out. Depending upon the importance attached to the test, parents
and/or students might be of assistance as well. If the test is for an
entire grade then all interested teachers should be involved in the
process. At a meeting to discuss the test blueprint, decisions may be
reached via some form of consensus (or close to it) or a group
process, such as the Delphi technique. A questionnaire could be used,
particularly if parents are involved, but if the number of
participants is small this procedure may be unwarranted. If the test
is for a large scale assessment project then a survey of the school
and community should be undertaken. The community could be defined as
broadly as the test is important; suffice it to say that interested
citizenry and those people on whom the test has an effect should be
included in the process. The survey should involve a questionnaire
which lists the entire set of descriptive objectives. The respondents
should be asked to determine the criticality of each behavior on some
form of relative "importance" scale. When a test is for use in
certifying occupational personnel or licensing professionals, a
procedure similar to that of large scale assessment (i.e., a survey
questionnaire listing objectives which asks respondents to judge their
relative importance) should be used. In this case the respondents
should primarily be practitioners, but an astute test developer may
also want to include trainers and consumers in the survey population.
A less desirable procedure (and, in fact, a less acceptable method in
terms of judicial scrutiny) is one where trainers meet to discuss the
merits of one objective over another.
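One simple way to summarize responses from such an importance survey is to average the ratings each objective receives and sort the objectives by that average. The sketch below assumes a 1-to-5 rating scale and invented response data; the paper does not prescribe a scale or an aggregation rule, and the function name `rank_objectives` is hypothetical.

```python
from statistics import mean


def rank_objectives(ratings: dict[str, list[int]]) -> list[tuple[str, float]]:
    """Return (objective, mean rating) pairs, most important first.

    Objectives with no ratings are skipped; ties are broken alphabetically.
    """
    summary = [(obj, mean(scores)) for obj, scores in ratings.items() if scores]
    return sorted(summary, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical survey responses on an assumed 1 (low) to 5 (high) scale.
responses = {
    "Utilize the resources of a library": [4, 5, 3, 4],
    "Maintain family finances": [5, 5, 4, 5],
    "Write a newspaper article": [2, 3, 3, 2],
}

for objective, average in rank_objectives(responses):
    print(f"{average:.2f}  {objective}")
```

A ranked summary of this kind is only an input to the committee's decision; as the next paragraph of the text stresses, survey results do not by themselves validate the blueprint.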

The final step in developing a test blueprint is to validate the
selection of the content. At the classroom level the teacher (or
teachers) may want to have other colleagues, parents and/or
administrators inspect the tentative blueprint and make suggestions
for improvement or give their "stamp of approval" to the test outline.
If there are very many of these suggestions for improvement, the
blueprint should go back to those who made it to begin with. If, when
developing a classroom level test, the content "validators" are the
same as the content "determiners" (a procedure which is not a
particularly good one) then it is suggested that the determining and
validating procedures be done at least a few days apart from one
another. For large scale assessment or certification and licensure
tests it might appear that the use of an extensive survey to determine
test content automatically produces a valid test blueprint. This is
not the case. The results (including relative ranking) of the survey
should be compiled into logical (or meaningful) categories and
reviewed. Large-scale assessment projects should seek opinions
concerning the comprehensiveness, representativeness and relevance of
the tentatively selected objectives from teachers, parents,
administrators, scholars and community leaders. Tentative blueprints
which are to be used in tests for high school graduation should be
examined by representatives associated with those groups in society
which are affected by the test. Tentative blueprints for certification
or licensure tests should be reviewed by knowledgeable teachers and
practitioners. Also, careful attention should be paid to existing job
descriptions to assure that there is reasonable line up between the
test blueprint and the occupation. Necessary revisions or additions to
the test blueprint should be made based on the results of the final
reviews of the blueprint. In all cases, the committee which is in
charge of the testing project should monitor (and, likely, be
involved in) all phases of developing the test blueprint.

3. Preparation of "Domain Specifications"

The outcome of this step is a set of domain specifications (see
Popham, 1978). The procedure is exhaustive with respect to each
validated objective. It is important to note that no question like,
"Is it feasible to test this?" or "Is this domain specification
necessary?" should be asked until step five. This step requires
expansion of the descriptive objectives into domain specifications.

Each'^validated objective niust fte Included in at least one domain // . 

specification. Validated objectives 'may appear in more than one ^oma4;f 

specification. This may occur as domain specifications become broader
and consequently can include more material. Also, a validated objective
may have both a knowledge component (which lends itself to paper-and-
pencil measurement) and a skill component (which lends itself to
performance-based measurement). In addition to the standards applied in
the writing of domain specifications (for methods and examples, see
Popham [1978] and Hambleton & Eignor [1979a]) there are some other
elements which need to be considered. Domain specifications should be
written for both objective (paper and pencil) and non-objective
(performance based) items. If the domain specification is for performance
based testing then the environment for testing, personnel requirements,
possible scoring techniques, and materials and performance aids which
are needed for the test should be considered and included in the
specification. Two examples are offered in Appendix A.¹ The first is for
performance in a "closed domain," i.e., the examinee has relatively
limited parameters for acceptable performance. Other examples of closed
performance are "filling out an income tax form," "filling out a job
application," "making a hospital bed" or "replacing a carburetor."
The second one is for performance in an "open domain," i.e., the examinee
has relative freedom in choosing a method of acceptable performance.
Domain specifications in this area are more difficult to score but
these difficulties are manageable. Other examples of open performance
are "leading a group," "handling office work flow," "bedside manner,"
"writing a newspaper article," etc. It is possible to construct domain
specifications in this area and these important areas of human endeavor
need not be ignored.

Appendix B is a short introduction to the types of non-objective
test formats. This should prove to be an interesting section to those
who are interested in going beyond standard paper and pencil objective
test formats.

¹ We note here, however, that the scoring sections of the domain
specifications require more work.


4. Review of Domain Specifications

The product of this step is a set of domain specifications of
acceptable quality. The domain specifications which were constructed
in Step 3 are reviewed for clarity and completeness. Also, the sample
test items are reviewed to determine their appropriateness as indicators
of the content or behaviors defined by the domain specifications.
Finally, the domain specifications are compared to the test blueprint
in order to be certain that the validated objectives are adequately
covered.

5. Additional Test Planning

The outcome of this step is a reduced set of domain specifications
which will be used to prepare the test. Three concerns should be
addressed: (1) determine which domain specifications have the most
scope within practical limits; (2) determine which domain specifications
can be combined into a common thread (or scenario) in order to integrate
the test and increase fidelity and representativeness; and (3) determine
the number and type of items. As these three points are considered it is
important to keep in mind (a) the purpose of the test and resources
available for testing derived in Step 1, and (b) the validated list of
objectives derived in an earlier step.



In order to make these decisions the classroom teacher can decide
solely or in conjunction with others who are interested in the use of
the test. Large-scale assessment endeavors and occupational/professional
testing programs must rely on a group process to make these decisions.
The groups should include (again) all interested parties and almost
certainly will include the committee overseeing the test development
process. Decisions concerning the number of items to be used in each
domain and in the test should be carefully considered in light of the
above concerns but also in order to appropriately maximize the validity
of decisions arising from the use of the test.



6. Preparation of the Test Content

The outcome of this step is a set of test items drawn from the
approved domain specifications. This step is split into two branches:
(1) non-objective format, for performance-based items designed to tap
examinee skill, and (2) objective format, for paper and pencil items
designed to tap examinee knowledge. Only the first branch will be
considered here; the second is well known.

The first thing to do is to make sure that the resources which
are needed for the test situations are available. Next, instructions
should be given to item writers. The instructions consist primarily
of the domain specifications but when constructing a situation the item
writers will have to tend to other details. In addition to writing
directions and items for the examiner and examinee, other standardizing
aspects should be articulated, e.g., physical conditions, personnel
requirements, number of examinees to be tested simultaneously, needed
equipment (and its condition) for both examinee and examiner, etc.

Directions for the test administrator probably should:

1. Specify testing materials and recommend they be checked before
   testing begins.

2. Describe clearly what an administrator should do and say.
   Occasionally, it is helpful if directions also mention
   what test administrators should not say and do.

3. Provide an overview of the testing process.

4. Describe ways for the test administrator to introduce the test
   and put the examinee at ease.

5. Stress the importance of having prior training (or at least
   practice) in administering the test.

Directions for the examinee probably should:

1. Address the purpose of testing and why an examinee should perform
   to the best of his/her ability.

2. Explain each step in the testing process.

3. Address time limits.

4. Explain the scoring system.

5. Introduce performance (or job) aids.

6. Explain the test environment and the amount of realism
   which is expected.

In composing test items, item writers should adhere rather strictly
to the domain specifications at hand and strive to set up situations that
are as real/lifelike as possible within the aforementioned constraints
of the testing program.



7. Preparation of a Scoring Method

The outcome of this step is a method for scoring the test. Again
we will not address the procedure one should use in scoring objective
tests but rather we will focus on the scoring of non-objective tests.
Scoring of non-objective tests can take a variety of forms. Some
example formats for scoring tests are presented in Appendix C.

At this stage the item writer should choose from the scoring
possibilities articulated in the domain specification. Central to this
decision should be what scoring scheme will yield the most valid
information within the constraints of practicality. When developing a
simulation the item writer may suggest the degree of precision required
for satisfactory performance (this should not be confused with standard
setting, which is addressed in Step 10).



8. Test Materials Review

The result of this step is a group of items which are ready to be
compiled into the test. For classroom tests this step need not be elaborate
but it should be thorough. All test items should be scrutinized to
determine that they do in fact measure the domain specifications of
interest and that they do not include any technical flaws. For large-
scale assessment and occupational/professional examinations, this step
should be treated in its entirety. The items which have been written
and their attendant scoring procedures should be reviewed by content
specialists for content acceptability and scoring appropriateness and by
measurement specialists for technical acceptability and scoring
appropriateness. Possible formats for reviewing test items and scoring
non-objective test items are presented in Appendix D. Based on the
results of these reviews, items should be left intact (if accepted),
discarded (if hopeless), or revised (if possible). The revised items
should then be subjected to review again.

Next, the items should be subjected to a pilot test. Careful
attention should be paid to all aspects of the testing situation. Areas
which should be addressed in the pilot are item statistics (see Popham,
1978; Hambleton et al., 1978), clarity of directions, readability of
the test items, speededness, item bias, etc. Reviewers should also check
to make sure that the non-objective scoring procedures are articulated
well and are working properly (i.e., leading to reliable and valid scores).
Also, the scoring choice (from Step 7) should be reconsidered. On the
basis of the pilot test, items should again be either left intact,
discarded or revised.
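As an illustration of the kind of pilot-test summary referred to above, the following is a minimal sketch and not a procedure prescribed in this paper. It assumes that each item has been scored pass/fail (1/0) and that two raters have independently rated the same performances; the function names `item_difficulty` and `percent_agreement` are hypothetical. Proportion passing and simple percent agreement are only two of many possible statistics.

```python
def item_difficulty(responses):
    """Proportion of examinees passing each item.

    responses: one list per examinee, each holding 0/1 item scores
    in the same item order (an assumed data layout).
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    return [sum(r[i] for r in responses) / n_examinees for i in range(n_items)]


def percent_agreement(rater_a, rater_b):
    """Share of performances on which two raters gave the same rating."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Four pilot examinees, three items (illustrative data only).
pilot = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 1]]
print(item_difficulty(pilot))  # [0.75, 0.25, 0.5]

# Two raters judging the same four performances (illustrative data only).
a = ["acceptable", "unacceptable", "acceptable", "acceptable"]
b = ["acceptable", "acceptable", "acceptable", "acceptable"]
print(percent_agreement(a, b))  # 0.75
```

Low agreement between raters would suggest that the scoring procedure for that item needs to be articulated more clearly before the item is retained.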



9. Compilation of the Final Form (or Forms) of the Test

The outcome of this step is the test in its final form. This entails
final editing of the test directions, compiling the items into the test and
carefully delineating performance aids. In addition, some final decisions
have to be made about the ways in which the test will be scored. In the
case of objective tests this procedure is usually rather straightforward
(although discussions about the relative weighting of true-false and
multiple-choice items often produce lively debates). Decisions about
non-objective scoring procedures are difficult and important. A committee
consisting of both content and measurement specialists should meet to
determine which scoring procedures are most relevant to the task yet are
psychometrically sound. These discussions can best take place in light
of the pilot test results. Once the decisions are finalized, directions
for scoring and the finalized scoring forms can be compiled into the test.

It may be necessary to consider providing for test security.
Depending upon the situation in which the test may be used this may or may
not be necessary.

If there are parallel forms it will probably be necessary to design
and implement an equating study.

10. Determination of Standards

The matter of standard-setting is a difficult one to deal with.
It is clear that all standard-setting methods are judgmental and
arbitrary. However, as Popham (1978) correctly pointed out, arbitrary
standards are not bad or undesirable if by arbitrary it is meant that a
clearly developed plan for standard-setting was prepared, critiqued, and
implemented. Readers are referred to Hambleton and Eignor (1979a, 1979b)
for two reviews of the standard-setting literature and other references
are provided there as well.



11. Preparation of Report Forms 

The outcome of this step is a reporting system which meets the
needs of those with an interest in the test. A representative committee
might meet to determine the form and content of the reports, but this is
not absolutely necessary. It is possible to elicit the desires of the
various groups in separate meetings, interviews or questionnaires. After
an initial draft is made the report form should be reviewed by the
committee. It would be most helpful if sample information were provided
in the form. After revision the form should be finalized and made ready
for publication. It is unlikely that the committee would have to review
the revisions.

This step has had a history of neglect. When all is said and done
any test is not worth any more than the information derived and conveyed
from it. Careful, even meticulous, attention to this step can have big
pay-offs in terms of the usefulness of the test. The reader is referred
to Mills and Hambleton (1980) for a thorough and informative presentation
of how to report test scores.

12. Preparation of a Technical Manual

The well-known APA/AERA/NCME Standards for Educational and
Psychological Tests published by the American Psychological Association
in 1974 provides a complete set of guidelines for preparing technical
manuals. It suffices to say here that a good test manual should fully
describe the test development and norming process, test administration
directions, and reliability and validity information in relation to
each of the possible uses of scores derived from the test.



13. Publication of the Test

The outcome of this step is the finalized version of the test,
administrator's manual, technical manual, report forms, and performance
aids (if appropriate). While this may seem to be a rather straightforward
step the interested reader should see Thorndike's (1971)
article on this issue.

If the test is for wide-scale use we suggest that the usefulness
of various cut-off scores be reported in the final version. This may
greatly enhance the usefulness of the test for different locales.

14. Collection of Technical Data (Over Time)

Regardless of the strengths of a testing program in a particular
situation at a given point in time, curricula change, and so do
expectations for high school graduation, for entry-level into a profession,
job characteristics, and the types of people who are in programs, etc. This
means that the psychometric properties of tests will not remain static.
Periodic reassessment of test score reliability and validity is essential.
And, to paraphrase Bob Linn, norms unlike wine do not improve with age
and so norms tables must be updated periodically.



Conclusions and Suggestions for Research

In this paper a comprehensive model for building and validating
criterion-referenced tests was introduced. The model is not in final
form at this time, but we do feel it can be helpful to test developers
in sequencing their activities. We feel equally positive about our
support for the use of non-objective formats. Considerable research and
development work has been done in industry and the military with these
formats. Similar work should be done in education. The formats have
much to offer in the way of enhancing the validity of test scores and
related decisions.

Additional research should take several directions. First, there
is considerable need to substantiate the test development and validation
model. This might be constructively done by having test developers
(1) check the model for completeness and clarity and (2) match it to
the way in which they go about their work (or would if they could choose
an approach). Gaps and ambiguities in the model can be identified and
used as a basis for making model revisions. Second, there is a need to
go beyond the model and provide detailed methods and procedures for
carrying out each of the fourteen steps. Without methods and procedures
there is not an effective way for applying the model. Finally, more
examples of domain specifications in many content areas, like the two in
Appendix A, are needed.

Hopefully, some of the ideas and material presented in this paper
will encourage others to extend and improve upon our work. We hope
so because much work remains to be done and the potential for improving
the usefulness of criterion-referenced tests is substantial.



References 



Berk, R. (Ed.) Criterion-referenced measurement: State of the art.
Baltimore, MD: Johns Hopkins Press, 1980.

Fitzpatrick, R., & Morrison, E. J. Performance and product evaluation.
In R. L. Thorndike (Ed.), Educational measurement. (2nd ed.)
Washington, D.C.: American Council on Education, 1971.

Frederiksen, N. Proficiency tests for training evaluation. In R. Glaser
(Ed.), Training research and education. New York: Wiley, 1965.

Glaser, R. Instructional technology and the measurement of learning
outcomes. American Psychologist, 1963, 18, 519-521.

Hambleton, R. K., & Eignor, D. R. A practitioner's guide to criterion-
referenced test development, validation, and test score usage.
Laboratory of Psychometric and Evaluative Research Report No. 70.
(2nd ed.) Amherst, MA: School of Education, University of
Massachusetts, 1979. (a)

Hambleton, R. K., & Eignor, D. R. Competency test development,
validation, and standard setting. In R. Jaeger & C. Tittle (Eds.),
Minimum competency achievement testing. Berkeley, CA: McCutchan
Publishing Co., 1979. (b)

Hambleton, R. K., Swaminathan, H., Algina, J., & Coulson, D. B. Criterion-
referenced testing and measurement: A review of technical issues
and developments. Review of Educational Research, 1978, 48, 1-47.

McKeegan, H. F. Applied performance testing: What is it? Why use it?
Portland, OR: Clearinghouse for Applied Performance Testing,
Northwest Regional Educational Laboratory, Paper #1, undated.

Millman, J. Criterion-referenced measurement. In W. J. Popham (Ed.),
Evaluation in education: Current applications. Berkeley, CA:
McCutchan Publishing Co., 1974.

Mills, C., & Hambleton, R. K. Guidelines for reporting criterion-referenced
test score information. Laboratory of Psychometric and Evaluative
Research Report No. 100. Amherst, MA: School of Education,
University of Massachusetts, 1979.

Osborne, W. C. Developing performance tests for training evaluation.
Alexandria, VA: Human Resources Research Organization,
HumRRO-PP-3-73, February 1973.

Panitz, A., & Olivo, C. T. Handbook for developing and administering
occupational competency testing. Washington, D.C.: U.S. Department
of Health, Education and Welfare, Office of Education, National
Center for Educational Research and Development, National
Occupational Competency Testing Project, Research Project //8-,04M, 1971.

Popham, W. J. Criterion-referenced measurement. Englewood Cliffs, NJ:
Prentice-Hall, Inc., 1978.

Popham, W. J., & Husek, T. R. Implications of criterion-referenced
measurement. Journal of Educational Measurement, 1969, 6, 1-9.

Sanders, J. R., & Sachse, T. P. Problems and potentials of applied
performance testing. Proceedings of the National Conference on
the Future of Applied Performance Testing. Portland, OR:
Northwest Regional Educational Laboratory, 1975.

Thorndike, R. L. Reproducing the test. In R. L. Thorndike (Ed.),
Educational measurement. (2nd ed.) Washington, D.C.: American
Council on Education, 1971.

Tinkelman, S. N. Planning the objective test. In R. L. Thorndike (Ed.),
Educational measurement. (2nd ed.) Washington, D.C.: American
Council on Education, 1971.

Appendix A

Sample Domain Specifications

(1) Writing Checks for Specified Amounts

(2) Utilizing the Resources of a Library



Objective 



Student is able to write checks for specified amounts and to record
and balance the transactions on check registers.

Level

Senior High School



Sample Directions for Performance

You have a new checking account at a bank. The checks and register
have just arrived in the mail. With the checks it is now possible to pay
a few bills which require payment. The checking account was opened with
a deposit of $525.90. The bills to be paid are:

(1) Bank Plastics, Inc.          $75.40
(2) Martha's Gas Co.             $12.30
(3) Mortimer J. Snerd           $275.00
(4) Undermountain Utilities      $27.53

You should pay these bills by writing checks, and recording and
balancing each transaction in the check register. The checks need not
be mailed; just give them to the proctor along with the register when
you are finished.

You have fifteen minutes to complete the task.
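For the scorer, the register balances implied by these sample directions can be worked out ahead of time. The following is a minimal sketch, assuming the opening deposit and the four bill amounts given in the directions and assuming the checks are entered in the order listed; the function name `running_balances` is hypothetical. Decimal arithmetic is used so that cents are not distorted by floating-point rounding.

```python
from decimal import Decimal


def running_balances(opening, amounts):
    """Return the register balance after each check is deducted in turn."""
    balance = Decimal(opening)
    balances = []
    for amount in amounts:
        balance -= Decimal(amount)
        balances.append(balance)
    return balances


bills = [
    ("Bank Plastics, Inc.", "75.40"),
    ("Martha's Gas Co.", "12.30"),
    ("Mortimer J. Snerd", "275.00"),
    ("Undermountain Utilities", "27.53"),
]

balances = running_balances("525.90", [amount for _, amount in bills])
for (payee, amount), balance in zip(bills, balances):
    print(f"{payee:<26}{amount:>8}{balance:>9}")
# Balances after each check: 450.50, 438.20, 163.20, 135.67
```

A scorer could compare these values with the examinee's register entries when checking item 10f of the scoring key (amount correctly deducted from prior balance).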



Content/Behavior Domain

1. The examinee will be asked to write at least three and not more
   than five checks.

2. The beginning balance will be given as an amount between $100.00
   and $999.99.

3. The checking account will be "new," i.e., there will be no checks
   already on the register.

4. The examinee will give the completed checks and the register to the
   proctor when finished.

5. There is no restriction on the subtraction problems involved, i.e.,
   the examinee will be expected to borrow (as a subtraction procedure),
   subtract cents and dollars, and keep the decimal point where it
   belongs.

6. The checks would be written to fictitious companies or individuals.

7. The examinee will not be asked to overdraw on the account.
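One way to see how a domain specification constrains item writing is to express the numerical rules above as a check on a proposed task. The following is a minimal sketch under that assumption, not part of the original specification; the function name `violations` is hypothetical, and the requirement that check amounts be positive is an added assumption not stated in the domain.

```python
from decimal import Decimal

MIN_CHECKS, MAX_CHECKS = 3, 5                                  # rule 1
MIN_BALANCE, MAX_BALANCE = Decimal("100.00"), Decimal("999.99")  # rule 2


def violations(opening_balance, check_amounts):
    """List the domain rules a proposed check-writing task would break."""
    opening = Decimal(opening_balance)
    amounts = [Decimal(a) for a in check_amounts]
    problems = []
    if not MIN_CHECKS <= len(amounts) <= MAX_CHECKS:
        problems.append("number of checks must be between 3 and 5")
    if not MIN_BALANCE <= opening <= MAX_BALANCE:
        problems.append("beginning balance must be between $100.00 and $999.99")
    if any(a <= 0 for a in amounts):
        # Assumption for illustration; the specification does not say this.
        problems.append("check amounts must be positive")
    if sum(amounts, Decimal("0")) > opening:                   # rule 7
        problems.append("checks must not overdraw the account")
    return problems


# The sample task from the directions satisfies every rule.
print(violations("525.90", ["75.40", "12.30", "275.00", "27.53"]))  # []
```

An item writer could run each newly drafted task through such a check before it goes to the review described in Step 8.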


Performance Aids and Environment

1. The examinee will be given a check register form with no previous
   entries.

2. The examinee will be given double the amount of blank checks which
   are needed to pay the bills. (This is in case certain checks must
   be voided.)

3. A pen is necessary.

4. The checks should be authentic checks.

5. The checks should be seriated (pre-numbered).

6. Check registers which use stubs should not be used.

7. The environment should be a quiet, unhurried one.

8. The workspace should be adequate.

9. Calculators are not allowed.

10. A blank piece of paper is allowed.

Scoring 

Objective Criteria 

A recommended scoring key for the performance task follows:

(A) Accuracy

    The check

    1. Correct date.
    2. Name of payee in the proper space.
    3. Numerical amount in the proper space.
    4. Numerical amount is the correct amount.
    5. Numerical amount written correctly in numbers,
       i.e., 51.27.
    6. Numerical amount written correctly in words.
    7. Signature in proper place.
    8. Proper name signed to check. (Middle name
       may be deleted or abbreviated.)
    9. Reason for check noted in "memo" section. (optional)

    The register

    10. Transaction entered on register
        a. check number
        b. date
        c. payable to
        d. correct amount
        e. amount in correct column
        f. amount correctly deducted from prior balance

(B) Time

    1. Task completed in allotted time.
    2. If less than allotted time: total elapsed time.



Subjective Criteria 

(A) Rating scales (place a check mark in the appropriate column)

                                                 Unacceptable   Acceptable

    1. Handwriting is legible.
    2. Numbers are clear.
    3. Signature is executed in a consistent manner.
    4. Register is kept orderly.
    5. Register is legible.

A student is identified as a "master" of this skill if his/her performance
on the Objective Criteria is 100% (excluding #9) and 100% of the subjective
ratings are in the "acceptable" category.
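The mastery rule just stated can be written out as a short decision function. The following is a minimal sketch, assuming that each objective criterion has been recorded as pass/fail and each subjective rating as "acceptable" or "unacceptable"; the function name `is_master`, the item labels, and the treatment of an empty score sheet as non-mastery are assumptions made for illustration.

```python
OPTIONAL_ITEMS = {"9"}  # item 9 (memo line) is excluded from the mastery rule


def is_master(objective, subjective):
    """Apply the rule: 100% of required objective criteria passed and
    100% of subjective ratings in the "acceptable" category.

    objective:  dict mapping criterion label -> True/False
    subjective: dict mapping rating label -> "acceptable"/"unacceptable"
    """
    required = [ok for item, ok in objective.items() if item not in OPTIONAL_ITEMS]
    ratings = list(subjective.values())
    if not required or not ratings:
        return False  # assumption: an unscored sheet is not mastery
    return all(required) and all(r == "acceptable" for r in ratings)


# One examinee's score sheet (illustrative data only).
objective = {str(n): True for n in range(1, 9)}          # check items 1-8
objective["9"] = False                                   # memo omitted; optional
objective.update({f"10{part}": True for part in "abcdef"})  # register items
objective["B1"] = True                                   # finished in allotted time
subjective = {
    "handwriting legible": "acceptable",
    "numbers clear": "acceptable",
    "signature consistent": "acceptable",
    "register orderly": "acceptable",
    "register legible": "acceptable",
}
print(is_master(objective, subjective))  # True
```

Because the rule is conjunctive, a single failed required criterion or a single "unacceptable" rating is enough to withhold the "master" classification.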



Objective 

The student is able to use the resources of a library to gather
material for preparing reports on selected topics.

Level

Senior High School



Sample Student Directions 

You have been assigned the topic of "Whales and Their Struggle
for Survival." To complete the assignment you must find source material
in the library in order to write about the topic. The details of your
task are as follows:

• You have two (2) hours to gather material.

• You have the entire library at your disposal.

• You should select the material you need and check it out
  according to library procedures. No more than eight (8)
  items may be checked out.

• Reference books may not be checked out so if you want to
  get information from them, then you must take notes and
  bring the notes out of the library.

• You are not allowed to photocopy material.

• You may not ask the librarian questions during the assignment.

You will be observed during this task and may be asked questions by
the observer concerning your activities. At the end of the two hours
you will be asked to do two things:

• Give your notes and the material you have checked out to the
  observer.

• Write a brief explanation of why you chose the particular
  materials that you did.



Content/Behavior Domain

1. The examinee will be assigned a topic that is of general interest
   and for which there is material available in books, journals,
   newspapers, and reference books. Examples of topics are "Whales and
   Their Struggle for Survival," "The Design and Safety Features of
   Modern Airplanes," "The Career of Henry Aaron," and "History of the
   Olympics."

2. The examinee must have borrowing privileges so that material can
   be checked out and evaluated.

3. After checking out the material (at the end of two hours), the
   observer will ask the examinee to write a brief rationale for the
   selection of each piece of material. Preparing rationale statements
   should require an additional ten to twenty minutes.

4. The examinee will be allowed to use the entire library to locate
   material.

5. The examinee will be told:

   • that note-taking is acceptable,
   • to locate material for use in writing a report on the assigned topic,
   • of the presence of an observer,
   • that questions will be asked concerning their activities.

6. The observer will collect the notes and the material which were checked
   out at the end of the two hours.




Performance Aids and Environment

1. A library of suitable size is to be used. School libraries with more
   than 10,000 volumes would normally be acceptable.

2. The library should have substantial information on the selected topic.
   ("Substantial" means that there is enough material in the library so
   that someone who possessed the skill could collect enough material
   to prepare the desired report.)

3. The examinee must have (at least temporary) borrowing privileges.
   The material which has been checked out may be returned within a
   half hour after the test in order to allow the next group of
   examinees access to the same material.

4. The examinee should have a notebook and pencil or pen.

5. The observer should be as unobtrusive as possible but may interrupt
   for brief periods in order to assess examinee performance.



Scoring (Several possibilities are given)

Objective Criteria

(A) Time (expressed in minutes)

    1. Time used (start to finish)

       • Amount of time used in locating material.
       • Amount of time used taking notes.
       • Amount of time off task, e.g., at bathroom,
         talking to friends, etc.

(B) Accuracy

    1. Locates material from classification numbers in the card
       catalogue. (number)

       • citations
       • "finds"
       • N/A (checked out, missing, etc.)
       • citations checked out

    2. Goes to correct place of items. (check one)

       directly    one error    two errors    > two errors    gives up

(C) Accomplishments (number)

    • items checked out of the library
    • pages of notes taken
    • citations (or at least classification numbers) written down
    • items perused (or used), but not checked out

(D) Effort

    1. Number of steps taken [as measured by a pedometer]

Subjective Criteria

(A) Self-Rating Scales

    1. Ease of the task
    2. Suitability of selected material

(B) Observer Rating Scales

    Rate the candidate (by placing a mark at the appropriate spot) on
    each scale.

    1. Rationale statements for materials.
       (scale from "totally unacceptable" to "highly acceptable")

    2. Relevancy of the materials.
       (scale from "totally irrelevant" to "highly relevant")

    3. Diversity of materials.
       (scale from "low" to "high")

Appendix B

Types of Non-Objective Tests

The purpose of this section is to describe several non-objective
test formats. We will not attempt an exhaustive categorization process
but we will provide a framework, some common terminology and descriptions.

Frederiksen (1965) lists seven methods of obtaining measures
for use in assessing examinee performance:

1. Solicit opinions: This can be accomplished formally or informally.
   Examinees (or individuals who know them) can be asked to provide
   ratings of performance.

2. Administer attitude scales: When the content of the scale is
   relevant to the behaviors of interest, the two measures should
   be (at least) moderately related.

3. Measure knowledge: This can be done via the development of a
   paper-and-pencil test. It is not usually sound to assume that
   knowledge of facts and principles is closely related to skill
   in performing a task.

4. Elicit related behavior: An example of this would be to have a
   student edit or rewrite writing samples as a test of English
   composition ability.

5. Elicit "what I would do" behavior: A common problem with this
   approach is that real-life problems generally don't present
   themselves in a multiple-choice format, or, at least, one which
   is presented with insufficient information.

6. Elicit lifelike behavior: This involves using a simulation or
   at least a situation that is set up by the test developer.

7. Observe real-life behavior: This is impossible to standardize.
   Often real-life behavior is used as a criterion for examinee
   success/unsuccess (e.g., supervisor ratings). Caution is
   warranted due to the fact that many intervening and
   uncontrollable variables may enter into the situation.

Objective test formats are commonly used to assess knowledge (method 3)
whereas non-objective test formats can be used to assess skills (methods
4 and 6).

Panitz and Olivo (1971) and Fitzpatrick and Morrison (1971), among
others, supply a scheme for categorizing tests using non-objective formats.
Table B.1 provides information for comparing four types of tests:
recognition tests, simulation tests, work-sample tests, and project/product
tests. What follows is a brief description of each:

1. Recognition Tests: This is sometimes called an "identification
   test" and measures the examinee's skill in recognizing the
   essential characteristics of a process or product by naming the
   object, describing the operation, and/or delineating the function.
   For example, a telephone repair person could be presented
   with a picture of a telephone set-up and be asked if the system
   was set up correctly. A diesel mechanic could be asked to
   identify the parts of an engine and their function and could
   even be asked to do it in a pre-specified order. We can include
   in this category certain problem-solving tests. For example, a
   licensure test for medicine could present the examinee with a
   medical history and the results of certain diagnostic tests.
   The examinee may be asked to interpret the findings and present
   possible treatment or recommend further testing.

   Identification tests can be given orally, in writing or even by
   computer. Careful attention should be given to sampling a variety
   of representative tasks from the test blueprint. The scoring of
   these tests should be objective and should clearly differentiate
   mastery/non-mastery proficiency. These tests have the advantage
   of being reasonably easy to construct, administer and score, but
   do not readily measure Frederiksen's category number six: elicit
   lifelike behavior.

2. Simulation Tests: In simulation tests an examinee carries out
realistic tasks in a setting which simulates a real situation.
Role-playing is often an essential ingredient of a simulation.
For example, a "psychologist" (examinee) may be asked to treat
a "client." A managerial trainee is confronted with an
"in-basket" on his/her desk and asked to respond to a variety of
plausible problems. Computer, or other, "games" which present
interactive problems to "generals," "economists," "managers,"
etc., can be grouped within this category of testing. Simulations
are often used when the situation is too large (e.g., economics)
or amorphous (e.g., management) to be readily measured. An even
more compelling use of simulations occurs when the job presents
a health or safety hazard. Airline pilot training makes extensive
use of simulations, as does the training of astronauts. The health
professions are increasingly utilizing simulations of clinical
conditions. Programs which train people in dangerous professions,
e.g., ship's captains, workers who deal with high voltage
electricity, etc., frequently utilize simulations.



Table B.1 Types of Tests

Characteristic: Useful Situations for Application
   Recognition Tests:
      1. large groups of examinees
      2. economy is important
   Simulation Tests:
      1. where the situation is too large and amorphous to have
         a "real" situation
      2. factors under consideration must be limited
      3. where health or safety is a factor
   Work-Sample Tests:
      1. when on-the-job observation is possible
      2. where the work in question can be accurately observed
      3. primarily used with skilled or semi-skilled workers
   Project/Product Tests:
      1. where process is not important
      2. where a variety of processes are acceptable
      3. when test development and administration costs are
         limited

Characteristic: Examples
   Recognition Tests:
      1. identify parts of a diagram
      2. point to specified components
      3. identify functions of various ...
   Simulation Tests:
      1. role playing
      2. games (computer & otherwise)
      3. in-basket
      4. secretarial tests
   Work-Sample Tests:
      1. troubleshoot and repair
      2. production output, e.g., machinist, secretary
   Project/Product Tests:
      1. artistic projects
      2. sports contests
      3. science fairs

Characteristic: Validity for Determining Proficiency
   Recognition Tests:
      1. low for skills
      2. high for knowledge
   Simulation Tests:
      1. moderate/high for skills
      2. moderate/low for knowledge
   Work-Sample Tests:
      1. high for skills
      2. moderate for knowledge
   Project/Product Tests:
      1. moderate/high for skills
      2. moderate for knowledge

Characteristic: Response Modes
   Recognition Tests:
      1. paper and pencil
         - multiple choice
         - fill in blank
      2. oral
      3. computer interaction
   Simulation Tests:
      1. paper and pencil
      2. computer interaction
      3. oral
      4. manipulative
   Work-Sample Tests:
      varies; depends on actual job requirements
   Project/Product Tests:
      varies; the only response is the product

Characteristic: Scoring Modes
   Recognition Tests:
      1. objective
   Simulation Tests:
      1. objective, e.g., did/did not do
      2. subjective, e.g., observer ratings
   Work-Sample Tests:
      1. objective, e.g., output, waste, accuracy, etc.
      2. subjective, e.g., rating scales, ranking, etc.
   Project/Product Tests:
      1. objective, e.g., measure tolerances, product works/does
         not work, amount completed, etc.
      2. subjective, e.g., artistic merit

Characteristic: Process/Product Evaluation
   Recognition Tests:      does not apply
   Simulation Tests:       process and/or product
   Work-Sample Tests:      process and product
   Project/Product Tests:  product

Characteristic: Costs
   Recognition Tests:
      relatively inexpensive to develop, administer and score
   Simulation Tests:
      expensive to develop, administer and score
   Work-Sample Tests:
      expensive to develop
      costs vary to administer (often it is on-the-job time)
      relatively expensive to score
   Project/Product Tests:
      inexpensive to develop
      costs vary to administer
      costs vary to score

Characteristic: Fidelity
   Recognition Tests:      low
   Simulation Tests:       high
   Work-Sample Tests:      high
   Project/Product Tests:  moderate/high

Characteristic: Useful as Instructional Device
   Recognition Tests:      yes
   Simulation Tests:       yes
   Work-Sample Tests:      no
   Project/Product Tests:  yes

Comments: The test constructor must strive for maximum fidelity
within allotted resources.



Simulations often entail a variety of response modes: paper-
and-pencil, oral, computer, manipulative, etc. This can present
difficult scoring problems. Also, caution is warranted in
assuming a degree of relationship between simulated performance
and performance with actual equipment and people under realistic
conditions. Finally, careful attention should be paid to which
tasks are simulated. An effective task analysis may alleviate
many difficulties with respect to test validity, but a sample of
isolated tasks or series of tasks may not be a valid sample of
the total job situation.

3. Work Sample Tests: While these tests may appear in some ways to
be similar to simulations, the essential difference is that they
require that the individual demonstrate proficiency by doing a
series of tasks or completing a piece of work under actual work
conditions. This is the most "realistic" type of test available
and has the highest face validity. For practical purposes the test
often consists of a sample of a job. For example, it may not be
feasible for a T.V. technician to rebuild an entire set, so we may
observe her/his troubleshooting and repair skills. Work sample
tests have primarily been used in the past with semi-skilled or
skilled workers. We see little reason, however, for limiting their
application.

It is difficult to standardize this type of test but it is not an
impossible undertaking. When the sample of work is an appropriate
one these tests can provide reliable and valid estimates of
proficiency.

4. Project/Product Tests: This type of examination entails
evaluation of only the result of a series of tasks. Something is
presented and evaluated. Science fairs, musical or dramatic
performances, most athletic competition, art shows, industrial
arts projects, etc., are only a few of the types of activities
which readily lend themselves to this type of evaluation.
Evaluating only the finished product does not adequately assess
process or examinee knowledge, but nevertheless this type of test
is often quite useful and generally very economical.

All four types of tests described above have considerable potential
for criterion-referenced test developers.






Example Formats for Scoring Non-Objective Tests

The scales which are delineated below are suggestive of the types
of scoring formats which are available. Scoring is an important
consideration and a difficult responsibility for the test
constructor. Depending on the scope of the skill(s) to be
evaluated, it is unlikely that only one format will adequately
measure examinee proficiency. When designing a test, the test
constructor should peruse this list to see which scoring
procedures can adequately be used to assess proficiency.

The first five types of procedures are relatively more objective
than the following five types. Fortunately, there are at least three
promising methods for increasing the reliability of assessments:

1. Use several indicators (or measures) of performance,

2. Increase the number of skills to be measured,

3. Thoroughly train (and re-train) observers/scorers.



Objective Measurement

1. Time

This is a measure dealing with the amount of time which an examinee
uses in demonstrating a skill.

Example:

Time started    ________

Time finished   ________

Elapsed time    ________
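
As a minimal sketch of how the "Elapsed time" entry on a form like this might be computed, assuming both clock times are recorded as HH:MM on the same day. The function name and the time format are illustrative assumptions, not part of the original form.

```python
from datetime import datetime


def elapsed_minutes(time_started: str, time_finished: str, fmt: str = "%H:%M") -> float:
    """Return minutes between two same-day clock times (hypothetical helper)."""
    start = datetime.strptime(time_started, fmt)
    finish = datetime.strptime(time_finished, fmt)
    if finish < start:
        # A finish time earlier than the start time is a recording error.
        raise ValueError("time finished is earlier than time started")
    return (finish - start).total_seconds() / 60


if __name__ == "__main__":
    print(elapsed_minutes("09:15", "09:42"))
```

The result could then be compared with whatever time standard the test constructor has set for the skill.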




2. Accuracy

These are measures which deal with the quality of product or
process.

Example 2.1:

Number of typing errors on a ten-minute test;

Is the stock cut to desired length (± .01 inch)?

Is the blueprint prepared according to the specifications?

Example 2.2:

Objective: Wooden bookshelf cut to 1/16" accuracy.
Scale: Wood cut at two feet.

|-------|-------|-------|-------|-------|-------|-------|-------|
1'11 3/4"  1'11 13/16"  1'11 7/8"  1'11 15/16"  2'  2' 1/16"  2' 1/8"  2' 3/16"  2' 1/4"

The observer is told to measure the wood and record the dimension.
The scoring could be 10 pts. for 1'11 15/16" to 2' 1/16", 8 pts.
for the next 1/16" from perfect, 6 pts. for the next 1/16" and
0 pts. beyond that.
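
The scoring rule in Example 2.2 can be sketched as a small function. This is one possible reading of the rule, assuming the measured length is expressed in inches and that each point band is one further 1/16" of deviation from the two-foot target. The function and constant names are hypothetical.

```python
from fractions import Fraction

TARGET_INCHES = Fraction(24)   # two feet, the target from Example 2.2
STEP = Fraction(1, 16)         # one sixteenth of an inch


def accuracy_points(measured_inches) -> int:
    """Score a cut by its deviation from the two-foot target (illustrative)."""
    deviation = abs(Fraction(measured_inches) - TARGET_INCHES)
    if deviation <= STEP:        # 1'11 15/16" to 2' 1/16"
        return 10
    if deviation <= 2 * STEP:    # the next 1/16" from perfect
        return 8
    if deviation <= 3 * STEP:    # the next 1/16" after that
        return 6
    return 0                     # anything beyond


if __name__ == "__main__":
    # 23 15/16 inches is 383/16 inches.
    print(accuracy_points(Fraction(383, 16)))
```

Exact fractions are used instead of floating-point numbers so that a measurement falling exactly on a band boundary is scored consistently.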




3. Frequency

These are measures dealing with the frequency of behavior
repetition.

Example 3.1:

Within a 20 minute time period, observe the number of times a
teacher does the following things (put a check for each occurrence):

                                       Tallies        Total

Asks recall question                   ________       _____

Asks student to read                   ________       _____

Provides feedback                      ________       _____

Example 3.2:

(In-basket simulation.) Within a one hour time limit, observe the
number of times an examinee performs each of the following things
(put a check for each occurrence):

                                       Tallies        Total

Reads something from in-basket         ________       _____

Dictates memorandum to subordinate     ________       _____

Dictates letter to client              ________       _____

Drafts a personal memorandum           ________       _____

Puts information back into in-basket   ________       _____
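
A tally sheet of this kind amounts to counting occurrences of each prespecified behavior. The sketch below illustrates that idea with the three teacher behaviors from Example 3.1. The function name and the convention of ignoring behaviors not on the sheet are assumptions made for illustration.

```python
from collections import Counter

# Behaviors listed on the tally sheet in Example 3.1.
BEHAVIORS = (
    "asks recall question",
    "asks student to read",
    "provides feedback",
)


def tally(observations, behaviors=BEHAVIORS):
    """Count how often each listed behavior was observed (illustrative)."""
    counts = Counter(obs for obs in observations if obs in behaviors)
    # Report every listed behavior, including those never observed.
    return {behavior: counts[behavior] for behavior in behaviors}


if __name__ == "__main__":
    observed = [
        "asks recall question",
        "provides feedback",
        "asks recall question",
    ]
    for behavior, total in tally(observed).items():
        print(f"{behavior}: {total}")
```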

4. Amount Achieved or Accomplished

These measures deal with the amount of output produced by an
examinee.

Example 4.1:

Number of words typed in 5 minutes.

Example 4.2:

Number of telephone inquiries handled in one hour.
Number of times supervisor helps with an inquiry.

Example 4.3:

Wickets packaged in a 15 minute time period:

(Directions: Tally the packaged wickets         0-10     ____
and check the appropriate line.)                11-15    ____
                                                16-20    ____
                                                21-25    ____
                                                26-30    ____
                                                over 30  ____

For scoring there could be a 0-5 scale, i.e., 0-10 = 0;
11-15 = 1; . . . over 30 = 5.
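
The 0-5 scale in Example 4.3 maps each count range to one score point. A minimal sketch of that mapping follows, using the upper bounds of the ranges shown on the form. The function name is hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the ranges 0-10, 11-15, 16-20, 21-25 and 26-30.
UPPER_BOUNDS = [10, 15, 20, 25, 30]


def wicket_score(wickets_packaged: int) -> int:
    """Convert a wicket count to the 0-5 scale of Example 4.3 (illustrative)."""
    if wickets_packaged < 0:
        raise ValueError("count cannot be negative")
    # bisect_left returns how many upper bounds lie strictly below the count,
    # which is the score: 0-10 gives 0, 11-15 gives 1, and over 30 gives 5.
    return bisect_left(UPPER_BOUNDS, wickets_packaged)


if __name__ == "__main__":
    for count in (8, 15, 16, 31):
        print(count, wicket_score(count))
```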

5. Consumption or Quantity Used

These are measures dealing with the resources expended in
performance. Often these measurements can easily be done in an
unobtrusive manner.

Example 5.1:

In order to check driving habits one might keep records
on the number of replacement tires a delivery person requires
each year and check it against miles driven.

Example 5.2:

In order to check for efficient use of electrical wire
for a simulated routine telephone installation, the test
constructor could set standards for maximal effective use of
wire, measure the amount of wire remaining after performance,
and check against the measurement taken before performance, e.g.,

Length of wire at start:    ________

Length of wire at finish:   ________

Length of wire used:        ________

Comment: This technique can be used for a variety of other
endeavors. For example, the skills test could measure the amount
of computer time used, the amount of telephone usage, the amount
of secretarial time used, etc.
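
The wire measurement in Example 5.2 reduces to a subtraction followed by a comparison against a standard. The sketch below assumes lengths are recorded in whole feet and that the standard is a maximum allowable amount of wire. The function names and sample figures are hypothetical.

```python
def wire_used(length_at_start: int, length_at_finish: int) -> int:
    """Wire consumed during the task, in the same unit as the inputs."""
    if length_at_finish > length_at_start:
        # More wire at the finish than at the start is a measurement error.
        raise ValueError("finish length exceeds start length")
    return length_at_start - length_at_finish


def meets_standard(length_at_start: int, length_at_finish: int,
                   max_allowed: int) -> bool:
    """True if consumption is within the standard set by the test constructor."""
    return wire_used(length_at_start, length_at_finish) <= max_allowed


if __name__ == "__main__":
    # Hypothetical figures: 100 ft issued, 62 ft returned, 40 ft allowed.
    print(wire_used(100, 62), meets_standard(100, 62, 40))
```

The same pattern would apply to the other resources mentioned in the comment, such as computer time or telephone usage, with a different unit.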



Subjective Measurement

Subjective measures are used to classify complex processes or
products into predetermined categories. The categories force the
observer/scorer to make discrete decisions in regard to
performances.



6. Rating Scales

Rating scales classify examinee performance on a continuum of
predetermined categories.

Example 6.1:

When answering the telephone this secretary is

1. overly friendly

2. courteous and professional

3. courteous but not very helpful

4. not very courteous but very helpful

5. neither courteous nor helpful

Example 6.2:

Please rate examinee performance in the four areas below by placing
a check in the columns corresponding to your ratings.

                                                            Unac-  Does
                                                            cept-  Not
Area                          Excellent  Good  O.K.  Poor   able   Apply

Typing letters

Taking dictation

Editing manuscript

Keeping accounts accurately



7. Forced Choice

Forced choice scoring is similar to rating scales except that
the scoring is done on an "all or none" basis.

Example 7.1:

Examinee took the patient's blood pressure.
(Circle one)

     Yes / No / N/A

     or

     N/A / Did Do / Did Not Do

Example 7.2:

The sales order was filled correctly.
(Circle one)

     Yes          No

Example 7.3:

For checking a series of steps a form like the one below might be used.

                         Yes     No

     1 . .               ___     ___

     2 . .               ___     ___

     3 . .               ___     ___

     4 . .               ___     ___

8. Checklists

Checklists are used to record the occurrence of a set of
prespecified behaviors. Sometimes checklists are called "cafeteria"
questions because the observer checks off what occurs from a
variety of choices, none of which necessarily exclude other items.

Example 8.1:

Check all that apply to this waitress simulation:

     ___ Served water
     ___ Asked if cocktails were desired
     ___ Obtained cocktails from bartender
     ___ Garnished cocktails
     ___ Correctly returned cocktails to persons ordering them
     ___ Passed out menus

Example 8.2:

Check all that apply to this teacher's day:

     ___ Took attendance
     ___ Collected lunch money
     ___ Conducted two reading groups
     ___ Had one hour of math instruction
     ___ Had students at lunch on time



9. Attitude Scales

These measures deal with examinee attitudes toward important
elements of their environment. There is a wealth of literature on
constructing and using attitude scales.

Example 9.1:

I think production deadlines are

a. of overriding importance.
b. very important as guidelines for production.
c. useful but not too important.
d. not particularly useful.

Example 9.2:

Reading technical literature in my field is

a. very important to me.
b. of some importance to me.
c. not important.

Example 9.3:

Math classes are my favorite time
during the school day.

     SA      A      N      D      SD

Example 9.4:

For the type of work I plan to do, I feel library skills are

     essential   somewhat    useful, but   not         unimportant
                 important   not           important
                             important



10. Behavior Categorization

These measures deal with categorizing behaviors or the results of
acts that have occurred.

Example 10.1:

Answers the telephone in a cordial manner. (Check one)

     ___ very cordial     ___ friendly     ___ too abrupt

Example 10.2:

Completed the sale. (Circle one)     Yes     Unsure     No

Example 10.3:

Ability to work with subordinates. (Check one)

     ___ very          ___ effectively   ___ somewhat      ___ ineffectively
         effectively                         effectively




Appendix D 



Review of Non-Objective Items and Scoring 







Figure D.1 Evaluation of Non-Objective Items

Criterion of Appropriateness:                       Yes   No   Unsure

1. Is performance of this skill necessary to
   job success? (In other words, will there
   be "trouble" if this element is ignored?)

2. Is the element necessary for barely
   acceptable workers?

3. Will this element differentiate superior
   workers from those who are not?

4. Is it practical to expect the examinee to
   perform this skill at this point?

5. Has performance of this skill been deemed
   important vis-a-vis a validated job
   analysis?

Item (Task) Content:                                Yes   No   Unsure

1. Does the task have a clear and logical
   beginning?

2. Does the task have a clear and logical
   end?

3. Does the task isolate the skills which
   are of interest?

4. Is the reading level appropriate for
   potential examinees?

5. Has the item been made excessively
   difficult by requiring unnecessarily exact
   or difficult operations?

6. Does the item give any contingencies that
   would unnecessarily inhibit completion?

7. Does the item present material on which
   the student has received instruction?

8. Is the item drawn from a validated test
   blueprint?

9. Can the skill be adequately performed in
   a given length of time?

10. If a product is to be evaluated are the
    expectations (specifications) delineated?



Item (Task) Structure:                              Yes   No   Unsure

1. Is the task delineated in an
   unambiguous fashion?

2. Is the item constructed in terminology
   commonly used in the trade or profession?

3. Do the directions give too many cues
   for proper task procedures?

4. Are the task directions stated as
   concisely as possible?

5. Are the task directions clear?

6. Does the item clearly specify what the
   examinee has to do?

Response Content:                                   Yes   No   Unsure

1. Is there one clearly best way to
   execute the task?

2. Are there a variety of acceptable ways
   to execute the task?

3. Will examinees who have received
   training be able to select the
   appropriate procedure?

4. Could an examinee who has not received
   training execute the task?

5. Is the desired precision of performance
   clearly indicated in the item?

Response Structure:                                 Yes   No   Unsure

1. Are the appropriate tools or work aids
   available to the examinee?

2. Are the tools and work aids in good
   condition?

3. Is the test environment conducive to
   good performance?

Directions:                                         Yes   No   Unsure

1. Is the examinee informed of the fidelity
   which is expected?

2. Do the directions inform the examinee how
   responses will be scored?

3. Do the directions inform the examinee
   about the purposes of the test?

4. Do the directions specify whether there
   is only one best procedure?

5. Do the directions specify whether there
   are a variety of acceptable procedures?

6. Do the directions specify an appropriate
   amount of time which should be spent on
   the tasks?

7. Do the directions specify any differential
   weighting procedures which will be used
   in scoring the test?

8. Do the directions attempt to reduce
   examinee tension?

Post-Item (Task) Selection Considerations:          Yes   No   Unsure

1. Do the items represent an adequate sample
   of the test blueprint?

2. Are the performances appropriate to the
   actual job?

3. Will the sampling of different (military)
   procedures be confusing to the examinee?

4. Are there mechanisms to allow the examinee
   to proceed after poor performance on one
   task?



Figure D.2 Scoring a Non-Objective Test

Scoring Performance Items:                          Yes   No   Unsure

1. For each task has the correct procedure,
   or the acceptable alternative, been
   delineated?

2. Are there provisions made for partial
   credit where appropriate?

3. Has the manner in which performances will
   be ranked, rated or categorized been
   identified?

4. When observer judgments are used are there
   sample responses to represent the several
   possible categories?

5. Does the scoring system provide for
   unexpected performance?

6. Has a scoring key been prepared?

7. Have arrangements been made to have
   observers at the test site?

8. Are the observers likely to be personally
   biased due to prior interaction with the
   examinees?

9. Will people who have mastery in the
   performance area be scoring the tests?

10. Will people who have mastery in the
    performance area be judging performances?

11. Is there adequate provision for training
    observers?

12. Have there been clear attempts to minimize
    observers making judgmental decisions?

13. Will the presence of the observer(s) affect
    performance?









